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Abstract 

This  paper  presents  a  model-based  algorithm  for  tracking  feature  points  over  a  long  sequence 
of  monocular  noisy  images  with  the  ability  to  include  new  feature  points  detected  in  successive 
frames.  The  trajectory  for  each  feature  point  is  modeled  by  a  simple  kinematic  motion  model.  A 
Probabilistic  Data  Association  Filter  is  first  designed  to  estimate  the  motion  between  two  consecu¬ 
tive  frames.  A  matching  algorithm  then  identifies  the  corresponding  point  to  subpixel  accuracy  and 
an  Extended  Kalman  Filter  (EKF)  is  employed  to  continually  track  the  feature  point.  An  efficient 
way  to  dynamically  include  new  feature  points  from  successive  frames  into  a  tracking  list  is  also 
addressed.  Tracking  results  for  several  image  sequences  are  given. 


ft 


ft 


ft 


ft  • 


ft 


ft 


94-30975 

llilllllii! 


The  support  of  the  Advanced  Research  Projects  Agency  (ARPA  Order  No.  A422)  and  the  Army  Research  Office 
under  Grant  DAAH-0493G0419  is  gratefully  acknowledged,  as  is  the  help  of  Sandy  German  in  preparing  this  paper. 


ft 


ft 


a* 


1  Introduction 

Motion  estimation  has  been  an  important  topic  in  the  field  of  computer  vision  for  more  than  a 
decade.  Based  on  matches  of  a  few  discrete  features  such  as  points  and  lines  over  two  or  three 
frames,  many  algorithms  tor  estimating  the  motion  of  the  camera  and  the  structure  of  the  feature 
points  have  been  proposed.  Although  linear  algorithms  result  when  two  or  three  frames  are  used, 
high  sensitivity  of  the  estimates  to  input  errors  has  been  observed  [1,  2,  11,  13].  In  the  meantime, 
the  robustness  of  approaches  that  use  a  sequence  of  images  has  attracted  the  attention  of  many 
researchers  [6,  17,  18,  19].  The  issue  of  finding  feature  correspondences  over  a  long  sequence  of 
images  needs  to  be  addressed  in  such  approaches. 

Besides  manual  tracking  algorithms,  existing  techniques  for  tracking  a  set  of  discrete  features 
over  a  sequence  of  images  generally  fall  into  two  categories:  two-frame  based  and  long-sequence 
based. 

(1)  Two- frame  based  approaches:  In  this  category,  finding  feature  correspondences  over  a  se¬ 
quence  of  images  is  broken  into  successive  problems  of  two- view  matching.  For  example,  in 
[16],  Weng,  Ahuja  and  Huang  used  multiple  attributes  of  each  image  point  such  as  intensity, 
edgeness  and  comerness  which  are  invariant  under  rigid  motion  in  the  image  plane  along  with 
a  set  of  constraints  to  compute  a  dense  displacement  field  and  occlusion  areas  in  two  images. 
Cui,  Weng  and  Cohen  [9]  then  used  an  intensity-based  cross-correlation  method  to  refine  the 
two-view  matching  results  and  obtain  feature  point  correspondences  over  the  sequence.  In 
[21],  Zheng  and  Chellappa  first  apply  an  image  registration  technique  to  compensate  for  the 
motion  of  the  camera  between  two  consecutive  frames.  Feature  point  correspondence  prob¬ 
lems  are  then  solved  by  repeatedly  identifying  the  matching  points  to  subpixel  accuracy  using 
the  correlation  matching  method. 

(2)  Long-sequence  based  approaches:  In  this  category,  smoothness  constraints  are  employed  to 
exploit  the  temporal  information  existing  in  the  sequence.  For  example,  assuming  that  the 
motion  of  an  object  does  not  change  abruptly,  Sethi  and  Jain  [15]  formulated  the  correspon¬ 
dence  problem  as  an  optimization  problem.  The  trajectories  of  a  set  of  feature  points  are 
obtained  by  searching  for  a  set  of  trajectories  each  of  which  has  maximal  smoothness.  Blostein 
and  Huang  [5]  used  Multistage  Hypothesis  Testing  (MHT)  to  detect  small  moving  objects 
in  each  image;  a  feature  trajectory  is  determined  by  repeatedly  detecting  the  same  feature 
point  over  the  sequence.  Chang  and  Aggarwal  [7]  assumed  a  2-D  kinematic  motion  model 
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and  applied  Joint  Probabilistic  Data  Association  (JPDA)  to  track  line  segments,  with  the 
ability  to  initiate  or  terminate  the  trajectory  of  a  line  segment.  Employing  a  3-D  kinematic 
motion  model  and  a  Mahalanobis  distance  based  matching  criterion,  Zhang  and  Faugeras  [20] 
applied  an  Extended  Kalman  Filter  (EKF)  to  track  a  set  of  line  segments.  A  fading  memory 
type  statistical  test  was  suggested  to  take  into  account  the  occlusion  and  disappearance  of 
line  segments. 

In  this  paper,  a  long  sequence  based  approach  is  proposed.  Finding  the  trajectory  of  a  feature 
point  over  a  sequence  of  images  is  formulated  as  a  recursive  state  estimation  and  measurement  iden¬ 
tification  problem.  A  discrete  2-D  constant  translational  and  rotational  motion  model  is  adopted  to 
describe  the  motion  of  every  feature  point.  Using  the  information  in  other  feature  points  detected 
in  subsequent  images,  a  Probabilistic  Data  Association  Filter  (PDAF)  [4]  is  employed  to  estimate 
the  motion  parameters  between  two  consecutive  frames.  However,  because  of  the  imperfect  fea¬ 
ture  detection  algorithm,  a  local  image  interpolation  technique  combined  with  weighted  correlation 
matching,  an  image  differential  method,  and  interpolation  of  pixel  locations  are  used  to  identify  the 
matching  point  to  subpixel  accuracy.  After  the  identification  of  corresponding  points,  an  EKF  is 
applied  to  refine  the  estimates  of  the  motion  parameters.  In  addition,  to  maintain  a  certain  number 
of  feature  points  on  the  tracking  list,  the  dynamic  inclusion  of  new  feature  points  extracted  from 
successive  frames  is  also  considered.  We  have  tested  the  feature  tracking  algorithm  on  several  real 
image  sequences  commonly  employed  in  motion  estimation.  Due  to  space  limitations,  we  present 
results  only  on  four  of  these  sequences. 

The  organization  of  the  paper  is  as  follows.  Section  2  presents  an  algorithm  for  tracking  a 
dynamic  set  of  feature  points.  The  tracking  results  for  four  real  image  sequences  are  shown  in 
Section  3.  Conclusions  are  presented  in  Section  4. 

2  Feature  Point  Tracking 

In  this  section,  an  algorithm  for  tracking  a  dynamic  set  of  feature  points  is  presented.  The  motion 
model  for  a  feature  point  moving  over  a  sequence  is  first  formulated.  A  scheme  for  estimating  the 
motion  between  two  consecutive  frames  and  procedures  for  identifying  the  matching  points  are  then 
suggested.  The  issue  of  the  inclusion  of  new  feature  points  is  addressed  afterwards. 
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2.1  Motion  Model 


$ 


To  model  the  motion  of  a  feature  point  over  a  sequence  of  images,  a  coordinate  system  xyt  shown 
in  Figure  1  is  first  established  with  the  origin  coinciding  with  the  center  of  the  first  image  and  the 
x-y  plane  parallel  to  the  image  plane  at  each  time  instant.  Then,  assuming  that  the  image  center 
of  each  frame  is  located  on  the  t-axis,  the  coordinates  of  the  center  of  the  kth  image  are  (0,0,tjt)T- 
The  state  vector  for  a  feature  point  at  time  tk  is  therefore  defined  as  follows: 
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<D 


x(k)  =  [x(fc)  y(k)  vx(k)  vy(k)  0(k)]T 


(1) 


where  (x(k),y(k),tk)T  are  .the  coordinates  of  a  feature  point  in  the  kth  image,  (vx(k),vy(k))T  is 
the  associated  translational  velocity  along  the  (x,y)  direction,  and  8(k)  is  the  rotation  angle  from 
the  ( k  -  l)st  image  to  the  fcth  image.  We  describe  the  motion  model  of  a  feature  point  as  well  as 
the  relationship  between  the  associated  state  vector  and  the  image  plane  coordinates  in  the  form 
of  plant  and  measurement  equations  in  the  following. 


Under  the  assumption  that  a  feature  point  moves  with  constant  translation  and  rotation  over  the 
sequence,  the  plant  equation  for  the  recursive  tracking  algorithm  can  be  written  as 
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x{k  +  1)  =  l[x(k)]  +  w(k  +  1) 


(2) 
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where 


^  x(k)  cos  0(k)  -  y(k)  sin  $(k)  +  vx(k)T  ^ 


x(k)  sin  9(k)  +  y(k)  cos  0(k)  +  vy(k)T 

/[*(*)]  =  |  «,(*)  |  (3) 

Vy(k) 

\  «(*) 

and  the  plant  noise  w(k )  is  assumed  to  be  zero  mean  with  covariance  matrix  Q(k).  The  addition  of 
the  plant  noise  takes  into  account  possible  deviations  of  the  true  motion  from  the  assumed  simple 
model.  The  time  interval  between  two  consecutive  frames  is  assumed  to  be  T. 


) 


B.  The  Measurement  Equations 


For  each  feature  point,  the  measurements  used  for  the  corresponding  recursive  tracking  filter  at 
time  tk  consist  of  the  image  plane  coordinates  of  the  corresponding  point  in  the  kth  image.  Thus, 
the  measurement  is  related  to  the  state  vector  in  (1)  by 


z(k)  =  Hx(k)  +  n(k) 


(4) 


where 


(  1  0  0  0  0 
\  0  1  0  0  0 


(5) 


and  the  measurement  noise  n(k)  is  assumed  to  be  zero  mean  with  covariance  matrix  R(k). 

After  the  plant  and  measurement  equations  have  been  formulated,  the  EKF  can  be  applied  to 
recursively  estimate  the  motion  between  two  consecutive  frames  and  track  the  feature  point. 


2.2  In- frame  Motion  Estimation 

For  every  feature  point  being  tracked,  to  simplify  the  matching  algorithm  in  identifying  the  cor¬ 
responding  points  in  successive  frames,  the  motion  between  two  consecutive  frames  needs  to  be 
compensated.  In  our  work,  the  PDAF  which  was  originally  proposed  by  Bar-Shalom  [3]  and  used 
for  tracking  a  moving  object  in  a  cluttered  environment  is  applied  to  provide  initial  estimates  of 
the  motion  parameters.  In  the  following,  assuming  that  the  trajectory  of  a  feature  point  has  been 
established  up  to  the  klh  frame  of  a  sequence,  procedures  for  estimating  the  in- frame  motion  be¬ 
tween  the  kth  and  (k  +  1)*‘  frames  are  described.  The  detailed  derivation  of  the  PDAF  can  be 
found  in  [4];  only  a  brief  review  is  given  in  the  Appendix  for  the  sake  of  completeness. 
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First,  for  a  feature  point,  the  predicted  location  of  its  corresponding  point  z(k  + 1  |Jk)  is  obtained 
as  follows: 


cos(0(fc|fc))  —  sin(<?(£|fc)) 
sin(0(fc|fc))  cos(0(k|fc)) 


*(*JA) 

y(k\k) 


®x{k\k) 

Uy(fc)fc) 


(6) 


where  [i(A:|A:),g'(fc|A:)),vr(fcjA:),Vj,(A:|/:),0(l;|fc)]  is  the  estimated  state  vector  at  tk  computed  by  the 
EKF. 


Subsequently,  a  window  centered  at  z(k  +  1|&)  is  extracted  from  the  ( k  +  l)th  image  and  the 
feature  point  extraction  algorithm  reported  in  [14]  is  applied  to  the  window  to  identify  salient 
feature  points.  Since  points  which  are  far  away  from  the  predicted  location  are  less  likely  to  be 
correct,  a  validation  gate  based  on  the  Mahalanobis  distance  [8, 10]  is  constructed  to  select  potential 
measurements.  Specifically,  a  validation  gate  centered  at  z(k  +  l|fc)  and  with  parameter  7  is  defined 
[4,  8]: 

Vfc+1(7)  =  | z:\z-  z(k  +  l|fc)]TS_1(fc  +  l)[z  -  z(k  +  l|fc)]  <  7}  (7) 

where  S(k  +  1)  is  the  covariance  matrix  of  the  innovation  vector  z  -  z(k  +  l|fc),  and  7  decides 
the  scope  of  the  validation  gate  and  can  be  obtained  from  the  chi-square  distribution  table.  A 
set  of  potential  measurements  thus  consists  of  the  extracted  points  whose  distances  are  less  than 
7.  The  PDAF  then  combines  the  information  in  the  potential  measurements  using  (38)  in  the 
Appendix  to  provide  estimates  of  the  motion  parameters  between  the  fcth  and  ( k  +  l)st  images.  For 
convenience,  the  resulting  estimates  are  denoted  by  Vx(fc4T|&  +  l),  V^,(fc+  l|fc+ 1), and  0(fc  +  l|A:  +  l) 
respectively  to  differentiate  them  from  the  outputs  from  the  EKF.  For  feature  points  for  which  none 
of  the  extracted  points  qualify  as  potential  measurements,  the  predicted  motion  parameters  in  (34) 
are  used  instead. 


2.3  Feature  Matching 

After  the  initial  estimates  of  the  motion  parameters  between  the  klh  and  (k  +  l)st  images  have 
been  obtained,  in  order  to  find  corresponding  points  (or  measurements  for  the  EKF),  a  sequence  of 
steps  similar  to  those  in  [21]  is  applied  to  achieve  subpixel  accuracy  matching.  First,  a  local  image 
registration  technique  is  used  to  compensate  for  the  motion  between  two  consecutive  frames.  The 
resulting  compensated  image  is  then  compared  with  the  original  image  and  the  matching  points 
for  the  neighboring  pixels  are  identified  using  weighted  correlation  matching.  However,  because 
of  the  3-D  motion  of  the  camera,  a  verification  procedure  is  employed  to  exclude  some  possible 
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wrong  matches  from  the  correlation  matching  before  applying  the  subpixel  correction  and  location 
interpolation  schemes  to  obtain  the  corresponding  point.  In  this  subsection,  the  scheme  for  finding 
the  corresponding  point  at  the  ( k  +  l)st  frame  is  described  in  detail. 

A.  Window  Interpolation 

Given  two  images  between  which  the  perspective  projection  distortion  is  negligible,  the  accuracy 
of  using  the  intensity-based  correlation  matching  method  to  find  the  matching  point  relies  on 
image  plane  compensation  for  rotation  and  scale  change.  Instead  of  applying  a  global  registration 
technique,  a  local  image  registration  technique  which  considers  small  patches  of  images  in  which 
the  feature  point  appears  is  employed  fo  -  each  feature  point. 

Since  only  small  patches  of  two  images  are  considered  at  any  given  time,  the  scale  change 
in  the  two  windows  is  assumed  to  be  insignificant.  The  more  accurate  predicted  location  of  the 
corresponding  point,  denoted  by  ( x'(k  +  l\k),y'(k  +  l|fc))T,  can  be  obtained  using  the  estimates 
provided  by  the  PDAF  as 

\ 

+ 


/ 


Vx(k+l\k+l) 

Vy(k  +  l|fc  +  1) 


(8) 


cos(0(A:  +  l|fc  +  1))  -  sin(0(fc  4-  l|fc  +  1))  j  x(k) 
sin(0(fc  +  1|A:  +  1))  cos(0(fc  +  1|A:  +  1))  J  ^  y(k) 

where  (x(k),y(k))T  is  the  corresponding  point  in  the  kth  frame. 

Thus,  centering  on  the  predicted  location  and  assuming  that  the  pixels  near  the  predicted 
location  undergo  the  same  motion,  a  window  denoted  by  I\  is  generated  using  back  propagation 
followed  by  bilinear  interpolation  of  the  intensity  function,  i.e.  for  the  point  whose  coordinates  are 
(x(fc+  l),y(fc-|-  1))T  in  the  (k  +  l)st  frame  and  which  belongs  to  /i,  the  corresponding  point  in  the 
kth  frame  is  computed  by 


/ 


cos(0(fc  +  l|fc  +  1))  sin(0(/c  +  l|fc  +  1)) 
-sin(0(Ar  +  1|&  +  1))  cos(0(k  +  l|fc  1)) 


x(fc+  1)-  Vx(k+  l|fc+  I)  > 
y(k  +  1)  -  Vy{k  1| k  +  1)  ) 


(9) 

Because  (x(k),  y(k))T  may  not  be  at  a  grid  point,  the  intensity  of  the  pixel  ( x(k  +1),  y{k  +  1))T  is 
obtained  by  interpolating  the  intensities  of  the  four  nearest  neighbors  of  (x(k),y(k))T: 

g[x(k  +  1),  y(k  +  1)]  =  (1  -  d*)(l  -  dv)gu  +  d*(l  -  dy)gu  +  (1  -  dx)dyg2i  +  dxdyg22  (10) 

where  dx ,  dy  are  the  distances  between  (x(fc),  y(k))T  and  its  neighboring  pixel  ([x(fc)],  [j/(fc)])T,  and 
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{911)512,021  >522}  represent  the  intensities  of  the  four  nearest  neighbors  of  (x(k),  y(k))T,  i.e. 

dx  =  x(k)  -  [x(k)] 
dy  =  y{k )  -  [y(k)] 


0n  =  5([*(*)],  [»(*)]) 

9i2  =  g([x(k)],[y(k)]+  1)  ^ 

921  =  ff([z(k)]  +  1,  [y(k)]) 

922  “  0([*(*)]+ l,[0(fc)]+ 1) 

Note  that  [•]  in  (11)  and  (12)  represents  the  floor  function  which  converts  a  real  number  into  an 
integer. 

B.  Window  Extraction 

As  in  the  procedure  used  in  estimating  in-frame  motion,  for  each  feature  point,  another  window, 
denoted  by  /2,  centered  at  the  predicted  location  of  the  corresponding  point  (z'(Ar+l|it),  y'(k+l\k))T 
in  (8)  is  extracted  from  the  ( k  +  l)st  image.  The  correlation  matching  method  described  below  is 
then  applied  to  /j  and  I2  to  find  the  corresponding  point. 


C.  Correlation  Matching 

Since  the  motion  between  the  time  instants  t*  and  (*+1  has  been  compensated,  a  simple  intensity- 
based  correlation  matching  method  is  employed  to  find  the  matching  points  in  Jj  and  J2.  Two 
approaches  are  possible,  as  suggested  in  [21j:  a  hierarchical  matching  method  (which  first  uses  a 
large  template  to  achieve  coarse  matching  and  then  searches  for  the  corresponding  point  around  the 
neighborhood  of  the  coarse  matching  result  with  a  small  template  to  achieve  better  localization)  or 
a  weighted  correlation  matching  method.  It  has  been  found  in  our  experiments  that  weighted  corre¬ 
lation  matching  outperforms  hierarchical  matching  not  only  computationally  but  also  in  accuracy. 
For  two  points  (x,y)T  €  I\  and  (x, y)T  6  h,  define  the  similarity  measure  as  [21] 

E.J  7.j[9i(«  +  *1 V  +  j)  ~  Mi][92(x  +  i,y  +  j)  ~  m] 


^sis2(z>0;z,y)  = 


where 


f'Li.j t ij[9i(*  +  i,y  +  j)  -  mi 32v/Ei7 7.;[02(x  +  i,y  +  j)  -  M2]2 


Mi  =  4£)0i  (*  +  *,y  +  j) 


•  •  4 


(15) 


3*2 


jf^92(x  +  i,y  +  j) 


•j 


Here  N  is  the  number  of  pixels  in  the  template  fi,  7 ,j  is  the  weight  associated  with  points  (x  +  i,y+ 
j)T  and  (x  +  i,y  +  j)T ,  and  g\  and  <?2  are  the  intensity  values  of  the  pixels  in  I\  and  /2  respectively. 

For  the  point  (x  +  i,y  +  j)T  or  (x  -f-  t,  y  +  j)T  in  the  template,  we  define  its  distance  from  the 
center  of  the  template  as 

max(|i|,|j|) 


In  order  to  achieve  better  localization,  weights  are  assigned  to  pixels  based  on  their  distances  from 
the  center  of  the  template.  In  addition,  contributions  from  points  of  the  same  distances  (i.e.  the 
summation  of  the  corresponding  weight  coefficients)  are  restricted  to  be  the  same  for  different  levels. 
Therefore,  we  choose  the  weights  as  follows: 


7  ij  =  < 


(»,j)  =  (0,0) 


[  8max(|>|,|j|)  (*’.?)  7M°>°) 

where  c  is  a  constant  and  is  chosen  to  account  for  the  relative  weights  at  different  levels.  Once  the 
weights  have  been  chosen,  the  matching  point  for  ( x,y)T  can  be  found  by  searching  over  a  small 
region  in  /2  for  the  point  which  has  maximal  value  of  the  similarity  measure  (13)  with  (x,  y)T. 


It  is  clear  that  the  above  correlation  matching  method  can  only  match  a  grid  point  in  I\  with 
another  grid  point  in  /2.  Since  the  predicted  location  of  the  corresponding  point,  (x'(fc  +  l|fc),  y'(k  + 
l|fc))T,  usually  does  not  coincide  with  a  grid  point,  the  matching  points  of  its  four  nearest  neighbors, 
(xn,  3/!! )x, (x12, 3/i2)T7 (^217 2/21)^  and  (x22,  j/22)t,  are  first  found  using  the  correlation  matching 
method,  where 


{Xn,yn)  =  ([x'(fc  +  l\k)} ,  [y'(A:  +  l|fc)j) 

0*72, 3/12)  =  ([x’(k  +  l|*)j ,  +  l|fc)]  +  1) 

{x2i,V2i)  =  ([x'(fc  +  l|fc)]  +  1 ,  [y'(k  +  1|*)]) 

(*22,3/22)  =  ([x'(fc  -HI*)]  +  1  ,  [y'(k+  l|fc)]+  1) 

and  an  interpolation  scheme  is  then  applied  to  obtain  the  corresponding  point. 


(16) 


D.  Occlusion  and  Perspective  Distortion  Verification 

Due  to  the  3-D  motion  of  the  camera,  there  may  exist  severe  perspective  projection  distortion 
between  two  windows  I\  and  J2,  as  well  as  occlusion  of  feature  points.  Either  case  is  likely  to 
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introduce  large  errors  when  correlation  matching  is  used.  A  scheme  to  exclude  possible  wrong 
matches  for  the  four  neighboring  pixels  is  employed  before  continuing  the  tracking  process. 

For  the  similarity  measure  i'gil2  defined  in  (13),  it  has  been  shown  that  [21] 


and  the  following  equality  holds: 


1,  if  gi(x  +  i,y  +  j)-n  1  =  g2{x  +  i,y  +  j )  -  H2 
-1,  if  gi(x  +  i,y  +  j)-m  =  ~[g2(x  +  i,y  +  j)  -  fi2] 


vd,j)en 


If  the  perspective  projection  distortion  between  two  windows  is  large  or  if  occlusion  occurs,  it  is  likely 
that  the  similarity  measures  corresponding  to  the  four  neighboring  pixels  will  be  small.  A  threshold, 
say  TH.  is  set  to  account  for  wrong  matches.  Denote  the  matches  resulting  from  correlation 
matching  for  the  four  neighboring  pixels  by  (xn,  yn;  xn,  yn),  (xi2,yu:  xi2,yl2),  (x21,  y2l\ x2X,  y2l) 
and  (x22,  y22:  x22,  y22).  Three  cases  are  considered  in  the  following. 

Case  1:  More  than  two  matching  pairs  have  similarity  measures  less  than  TH. 

In  this  case,  the  possibility  that  the  feature  point  is  occluded  in  I2  or  that  severe  distortion 
exists  in  the  two  windows  is  very  high.  We  remove  the  feature  point  from  the  tracking  list. 

Case  2:  Three  matching  pairs  have  similarity  measures  greater  than  TH. 

Because  there  is  only  one  matching  pair  with  a  high  possibility  of  a  wrong  match,  the  feature 
point  is  assumed  to  exist  and  the  perspective  projection  distortion  between  7j  and  /2  is  considered 
not  to  be  too  severe.  An  extrapolation  scheme  is  therefore  introduced  to  correct  the  wrong  match. 
Without  loss  of  generality,  let  (x22,y22\x22,y22)  be  the  matching  pair  whose  similarity  measure  is 
less  than  TH.  Since  the  four  neighbors  of  the  feature  point  form  a  square  in  I\ ,  the  four  matching 
points  are  assumed  to  form  a  parallelogram  in  I2.  The  matching  point  for  (x22.  y22)T  is  then 


recalculated  bv 


and  the  tracking  process  continues. 


x22  =  X12  +  x2i  -  x33 

yi2  =  yn  +  i>2\  -  yn 


Case  3:  All  the  four  matching  pairs  have  similarity  measures  greater  than  TH. 

In  this  case,  the  four  similarity  measures  show  high  similarity  between  the  two  windows.  There¬ 
fore  the  outputs  from  the  correlation  matching  are  considered  reliable  and  tracking  continues. 
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E.  Subpixel  Accuracy  Correction 


a) 
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For  tracking  over  a  sequence  of  images,  the  quantization  errors  in  matching  grid  points  to  grid  points 
accumulate,  resulting  in  deviations  in  the  trajectories.  In  order  to  reduce  accumulated  errors,  the 
matches  obtained  from  the  correlation  matching  method  need  to  be  refined  before  applying  the 
location  interpolation  scheme  to  find  the  corresponding  points. 

Since  a  good  initial  match  has  been  obtained,  the  image  differential  method  [12,  21]  provides 
a  simple  and  effective  way  to  achieve  subpixel  accuracy  matching.  Assuming  that  (x,y)T  £  I\  is 
matched  to  (x,y)T  £  I2  and  the  original  intensity  function  g2(x,  y)  is  offset  by  (Sx,Sy)  relative  to 
the  interpolated  intensity  function  gi(x,  y),  i.e. 


9i(x<y)  =  9i(x  -  6x,y-  6y)  (20) 

Then  the  difference  in  g\{x,y)  and  g2(x,  y)  can  be  written  as  [12,  21] 

d(x,y;x,y)  =  gi(x,y)  -  g2(x,y) 

=  9i(x,y)~  gi(x  -  Sx,y  -  6y)  (21) 

dgi(x,y)c  ,  dgi(x,y)c 
- + 

For  each  matching  pair  (x,  y:  x,  y),  suppose  that  the  above  approximation  holds  for  a  small  neigh¬ 
borhood  of  size  (2^  +  1)  x  (2w\f  +  1);  then  a  set  of  equations  can  be  formed  as  follows: 


D  =  GA 


(22) 


where 


1  d(x  —  Wi,  y  —  Wd'.x  -  Wd,y~  Wd)  ^ 
d(x,y\x,y) 

K  d(x  +  Wd,  y  +  wd;  x  +  wd,  y  +  u>d)  j 


/  dgi(x-wd,y-Wd) 
dx 

Ssifx-u id,y-vd)  \ 
dy 

G  = 

dgi(x,y) 

dx 

dy 

9  3l(*+UM.V+Wd) 

\  dx 

8v  / 

(23) 


(24) 
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and 


The  offset  vector  A  is  then  the  least  square  solution  of  (22)  and  can  be  obtained  as 

(GTG)~1GTD  (26) 

Thus,  ( x,y)T  is  matched  to  (x  +  6 x,y  +  6y)T  which  achieves  subpixel  accuracy  matching.  In  our 
experiments,  a  neighborhood  with  uj  =  3  was  employed. 


F.  Location  Interpolation 


After  the  matching  points  of  the  four  nearest  neighbors  have  been  found  to  subpixel  accuracy,  the 
point  corresponding  to  the  feature  point  is  obtained  by  the  location  interpolation  scheme  described 
below  [21]. 

For  each  matching  pair  ( x ,  y;  x,  y),  assume  that  the  relationship  between  (x,  y)T  and  (x,  y)T  can 


be  expressed  as 


x  =  fox  +  a2y  +  foxy  +  a4 
y  =  x  +  foy  +  foxy  +  fo 


or  expressed  relative  to  the  match  of  (xn,yn)r  as 

x-iii  =  ai(x  -  Ill)  +  <*2(y-2/u)  +  <*3{x  -  Xn){y  -  yu)  +  <*4 
y-y  n  =  fo(x  -  xu)  +  fo(y  -  yn)  +  fo{x  -  xn)(y  -  yu)  +  fo 

Then  from  the  four  matching  pairs  and  the  relationships  between  the  four  nearest  neighbors  in 
(16),  the  coefficients  a,  and  /3„  i  =  1, . . .,4  are  [21] 


Ql  =  X21  -  Xu 

0i  =  hi  ~  tin 


a2  =  X12  -  in 

02  =  til2  ~  Sill 


03  =  122  +  ill  -  Il2  “  ^21 
03  =  V22  +  yn  -  yi2  -  tin 


and 

q4  =  0 
04  =  0 

Substituting  (2&-32)  into  (28),  the  feature  point  location  interpolation  formula  can  be  written  as 


(32) 


’ 

X  =  in  +  (i2l  -  ^ll)ei  +  (it2 -2ll)fy  +  (222  +  ill  -  ^12  -  ^2l)Cxfy 

it  =  »n  +  ($21  -  »u)«x  +  (»i2  -  yiiK  +  (y22  +  »n  -  yu  -  y2i)<*fv 
ex  =  x  -  xu 
(y  =  y-yn 


(33) 


It  has  been  shown  in  [21]  that  if  the  quadrangle  formed  by  (in,  $n)r,  (ii2,  yi2)r,  (z2i>  $2i)r  and 
(i22,  $22 )T  is  convex,  the  corresponding  point  interpolated  from  (33)  is  also  inside  the  quadrangle. 

This  completes  the  task  of  identifying  the  point  corresponding  to  a  feature  point  in  the  (fc  +  l)st 
image.  The  EKF  can  now  use  this  matching  point  as  the  measurement  to  update  the  corresponding 
state  vector  as  well  as  the  covariance  matrix,  and  the  algorithm  is  ready  to  continue  tracking  the 
feature  point  to  the  next  time  instant. 


2.4  Inclusion  of  New  Features 

When  tracking  a  feature  point  over  a  long  sequence,  it  is  possible  that  the  point  moves  out  of  the 
field  of  view  or  is  occluded  by  the  other  objects  after  some  time  instant.  This  results  in  a  decrease 
in  the  number  of  feature  points  being  tracked.  In  addition,  because  of  the  motion  of  the  camera, 
features  not  detected  at  earlier  instants  are  likely  to  be  identified  later.  It  is  therefore  necessary  to 
consider  a  strategy  for  including  new  feature  points  extracted  from  the  successive  frames.  In  our 
work,  an  extracted  point  is  considered  to  be  a  new  feature  point  if  it  does  not  correspond  to  any 
point  currently  being  tracked.  Furthermore,  instead  of  initiating  tracks  for  all  new  feature  points, 
which  results  in  a  rapid  growth  in  the  number  of  feature  points,  validation  gates  as  defined  in  (7) 
are  employed  to  screen  the  newly  detected  feature  points.  A  new  feature  point  is  added  to  the 
tracking  list  if  it  lies  outside  all  the  validation  gates  associated  with  the  feature  points  currently 
being  tracked. 

For  example,  in  Figure  2,  eight  validation  gates  which  correspond  to  the  eight  feature  points  in 
the  tracking  list  are  formed.  Another  nine  feature  points  extracted  from  the  current  frame  are  also 
shown.  Since  only  z\ ,  and  zg  do  not  fail  into  any  validation  gate,  they  are  added  to  the  tracking 
list. 
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Figure  2:  The  inclusion  of  new  feature  points:  eight  validation  gates  and  nine  newly  extracted 
feature  points 

In  other  words,  the  newly  extracted  feature  points  are  not  tracked  if  they  are  too  close  to  the 
currently  tracked  feature  points.  This  is  particularly  useful  for  estimating  the  motion  of  the  camera 
since  uniformly  distributed  points  are  more  likely  to  cancel  out  the  effects  of  imperfect  knowledge 
of  the  camera  parameters  such  as  the  imaging  center  and  the  field  of  view.  The  proposed  scheme 
not  only  takes  into  account  the  decrease  in  the  number  of  feature  points  on  the  tracking  list  but 
also  prevents  the  number  of  feature  points  from  growing  too  fast  since  as  the  number  of  feature 
points  on  the  tracking  list  increases,  the  image  region  covered  by  the  validation  gates  also  grows. 

3  Experimental  Results 

In  this  section,  tracking  results  are  presented  for  four  real  image  sequences  taken  by  cameras 
undergoing  different  types  of  motion.  For  each  sequence,  in  addition  to  the  trajectory  termination 
criterion  in  Section  2.3,  depending  on  the  size  of  the  area  correlation  mask,  feature  points  too  close 
to  the  image  boundary  are  removed.  A  tracking  list  which  contains  the  matching  points  as  well 
as  the  new  feature  points  is  created  and  updated  at  every  frame.  For  visual  purposes,  only  the 
trajectories  of  the  feature  points  tracked  from  the  first  frame  as  well  as  the  new  feature  points 
added  to  the  tracking  list  at  subsequent  time  instants  are  displayed.  The  dynamic  behavior  of  the 
algorithm  is  shown  in  a  table  which  lists  the  number  of  feature  point  trajectories  being  maintained 
or  removed  from  the  tracking  list  and  the  number  of  new  points  selected  from  every  frame. 
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3.1  UMASS  PUMA2  Sequence 

The  first  sequence  is  known  as  the  UMASS  PUMA2  sequence;  it  consists  of  thirty  256  x  256  frames 
The  camera  is  connected  to  the  end  of  a  PUMA  robot  arm  and  rotates  about  a  rotation  center 
which  is  close  to  the  image  center.  The  tracking  results  for  a  set  of  manually  selected  points  which 
was  used  in  [18]  to  estimate  the  motion  of  the  camera  are  shown  in  Figure  3.  Figure  4  shows  the 
trajectories  for  a  set  of  feature  points  automatically  extracted  from  the  first  frame  by  the  algorithm 
reported  in  [14];  the  trajectories  are  shown  up  to  the  7th,  13th,  19th,  25th  and  30th  frames.  The 
motion  parameters  corresponding  to  the  two  points  marked  in  Figure  4(a)  computed  by  the  EKF 
are  displayed  in  Figure  5.  Note  that  the  coordinate  system  illustrated  in  Figure  1  has  been  changed, 
with  the  x-axis  pointing  downward  instead  of  upward  for  convenience.  Since  the  rotation  center 
does  not  coincide  with  the  image  center,  a  nonzero  translational  velocity  is  observed  for  both  points. 
The  number  of  feature  points  being  tracked  varies  with  time,  as  shown  in  Table  1.  The  new  feature 
points  extracted  by  the  feature  extraction  algorithm  from  frames  3,  7,  13,  19,  25  and  30  are  shown 
in  Figure  6,  in  addition  to  the  labeled  points  which  were  added  to  the  tracking  list  at  different  time 
instants.  As  seen  in  Table  1,  the  algorithm  for  adding  new  points  to  the  tracking  list  efficiently 
maintains  the  number  of  points  on  the  list. 


Table  1:  The  number  of  feature  points  in  the  tracking  list  for  the  UMASS  PUMA2  Sequence 


frame  number 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

#  of  points  in  the  list 

0 

21 

19 

28 

31 

35 

38 

40 

41 

40 

41 

43 

43 

44 

46 

#  of  points  extracted 

23 

23 

22 

26 

24 

27 

29 

29 

28 

28 

25 

16 

19 

20 

24 

#  of  new  points 

23 

0 

9 

4 

5 

5 

3 

3 

4 

4 

3 

1 

2 

2 

5 

frame  number 

16 

17 

18 

19 

20 

21 

22 

23 

24 

25 

26 

27 

28 

29 

30 

#  of  points  in  the  list 

49 

48 

49 

48 

50 

49 

51 

50 

53 

50 

52 

52 

53 

54 

53 

#  of  points  extracted 

18 

21 

20 

21 

20 

21 

20 

19 

23 

25 

24 

24 

27 

24 

15 

#  of  new  points 

2 

3 

1 

3 

1 

3 

0 

4 

0 

3 

0 

3 

2 

1 

1 
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3.2  Coke  Can  Sequence 

The  second  sequence  is  the  Coke  Can  Sequence,  in  which  the  camera  is  approaching  the  scene,  with 
the  Focus  of  Expansion  (FOE)  located  on  the  coke  can.  Fifteen  frames  chosen  from  the  densely 
sampled  sequence,  spaced  10  frames  apart,  axe  used.  The  original  512  x  512  images  are  down- 
sampled  to  256  x  256  before  applying  the  algorithm.  The  resulting  trajectories  from  the  first  frame 
to  the  5th,  10th  and  15th  frames  are  shown  in  Figure  7.  The  estimated  motion  parameters  of  the 
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(c.i  Trajectories  up  to  the  13th  frame  (d)  Trajectories  up  to  the  19th  frame 


(e)  Trajectories  up  to  the  2-V1'  frame  if)  Trajectories  up  to  the  3()ltl  frame 


figure  :{;  I  ra  jertories  for  the  (MASS  IM’MA‘2  Sequence  !  manually  selected  point  si 


( a  i  Feature  points  in  the  first  frame  i  b)  Trajectories  up  to  the  seventh  frame 


ic)  Trajectories  up  to  the  13th  frame  (d)  Trajectories  up  to  the  19th  frame 


te  i  Trajectories  up  to  the  'TV1'  frame  (f  i  Trajectories  up  to  the  3011'  frame 


Figure  I:  I  rajectories  for  the  FMASS  IMMA2  Sequence  i  automat  ically  extracted  points  i 
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(a)  Feature  points  in  the  third  frame  (b)  Feature  points  in  the  seventh  frame 


Ic)  Feature  points  in  the  13th  frame  (d)  Feature  points  in  the  19,h  frame 


Figure  (>:  Automatically  detected  new  feature  point;-  in  the  FMASS  PIMA2  Sequence 


•  •  •  • . _  •  •  t  •  • 


two  points  marked  in  Figure  7(a)  are  displayed  in  Figure  8.  As  seen  from  the  figures,  because  of 
the  pure  translation  of  the  camera,  the  trajectories  of  the  feature  points  diverge  from  the  FOE  and 
are  well  described  by  the  motion  model.  Table  2  lists  the  number  of  tracked  feature  points  at  each 
time  instant.  The  new  feature  points  added  at  the  2nd,  5th,  10th  and  15th  frames  are  marked  in 
Figure  9. 


(a)  Feature  points  in  the  first  frame  (b)  Trajectories  up  to  the  fifth  frame 


(c)  Trajectories  up  to  the  tenth  frame  (d)  Trajectories  up  to  the  15th  frame 


Figure  7:  Trajectories  for  the  Coke  Can  Sequence 

3.3  Rocket  ALV  Sequence 

The  third  sequence  is  the  30-frame  UMASS  Rocket  ALV  Sequence.  Again,  the  512  x  512  images  are 
down-sampled  to  256  x  256  before  applying  the  algorithm.  In  this  sequence,  the  camera  is  mounted 


(C) 


(d) 


Figure  8:  Motion  parameters  for  the  Coke  Can  Sequence  computed  by  the  Kalman  Filter 


(c)  Feature  points  in  the  tenth  frame  (d)  Feature  points  in  the  15th  frame 


Figure  9:  Automatically  detected  new  feature  points  in  the  Coke  Can  Sequence 
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Table  2:  The  number  of  feature  points  in  the  tracking  list  for  the  Coke  Can  Sequence 


frame  number 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

#  of  points  in  the  list 

0 

12 

13 

16 

18 

20 

22 

22 

22 

26 

27 

28 

28 

28 

28 

#  of  points  extracted 

13 

12 

15 

12 

13 

17 

15 

15 

16 

15 

14 

17 

15 

17 

14 

#  of  new  points 


13  1 


1  1 


0  1 


on  the  vehicle  which  appears  to  be  moving  along  a  straight  line  to  the  left  and  into  the  image 
plane  with  almost  no  rotation.  Due  to  the  uneven  terrain,  the  motion  of  the  camera  is  not  smooth. 
The  trajectories  for  the  feature  points  up  to  the  7th,  13th,  19th,  25th  and  30th  frames  are  shown  in 
Figure  10.  Figure  11  displays  the  motion  trajectories  corresponding  to  the  two  points  marked  in 
Figure  10(a).  The  uneven  motion  of  the  camera  results  in  the  motion  trajectories  in  Figure  11  being 
more  jerky  than  those  in  Figure  5  and  Figure  8.  Table  3  lists  the  number  of  feature  points  on  the 
tracking  list.1  The  extracted  feature  points  as  well  as  the  new  points  selected  by  the  proposition 
in  Section  2.4  from  the  2nd,  7th,  13th,  19th,  25th  and  30th  frames  are  shown  in  Figure  12.  As  seen 
from  the  figures,  many  feature  points  move  out  of  the  field  of  view  in  the  first  few  frames.  It  is 
therefore  necessary  to  include  new  feature  points  when  they  become  available.  Also,  it  is  apparent 
from  the  sequence  that  the  vehicle  has  an  abrupt  change  in  heading  direction  at  the  16th  and  20th 
frames,  but  the  algorithm  still  keeps  tracking  most  of  the  feature  points. 


Table  3:  The  number  of  feature  points  in  the  tracking  list  for  the  UMASS  Rocket  ALV  Sequence 


frame  number 


8  9  10  11  12  13  14  15 


#  of  points  in  the  list  0  19  19  23  22  22  19  16  19  21  24  22  22  23  21 


#  of  points  extracted  |  25  20  16  12  14  13  16  16  13  18  14  21  16  17  14 


#  of  new  points 


25  4 


1  1 


frame  number 

16 

17 

18 

19 

20 

21 

22 

23 

24 

25 

26 

27 

28 

29 

30 

#  of  points  in  the  list 

19 

19 

18 

18 

21 

19 

24 

25 

27 

29 

28 

26 

27 

25 

25 

#  of  points  extracted 

19 

19 

19 

18 

22 

15 

17 

16 

18 

18 

17 

16 

19 

19 

18 

#  of  new  points 


'The  feature  points  on  the  clouds  are  manually  removed  from  the  tracking  list  because  of  their  nonrigid  shapes, 
and  so  are  the  feature  points  detected  at  the  bottom  of  each  image  due  to  the  strip  patterns. 
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to  i  Trajectories  up  to  the  13th  frame  id)  Trajectoies  up  to  the  19th  frame 


i  <M  I ra j<Ttorie>  up  to  the  2-V*’  frame 


(ft  Trajectories  up  to  the  dO’1’  frame 


f  igure  10:  Trajectories  for  the  Rocket  Al.Y  Sequence 


23 


* 


K 


* 


» 


» 


» 


» 


• • •  •  t  •  •  f  •  • 


(e)  Feature  points  in  the  25th  frame  (f )  Feature  points 


lil  l  11*  li  0.111* 


Figure  12:  Automat ira.ll v  detected  now  feature  points  in  tin*  Rocket  AFY  Sequence 


3.4  Martin  Marietta  R3  Sequence 


The  last  sequence  is  one  of  the  four  sequences  distributed  by  Martin  Marietta.  As  in  the  third 
sequence,  the  camera  is  mounted  on  a  vehicle  and  the  images  are  taken  when  the  vehicle  is  moving 
through  an  outdoor  environment.  The  original  sequence  consists  of  densely  sampled  images  of 
size  347  x  238;  only  thirty  frames,  five  frames  apart,  were  used  in  the  experiment.  During  the 
acquisition  of  the  images,  the  vehicle  moves  to  the  right  and  slightly  into  the  scene.  Figure  13 
shows  the  trajectories  of  a  set  of  feature  points  from  the  first  frame  to  the  7th,  13th,  19th,  25th  and 
30th  frames.  As  seen  from  the  figures,  the  points  on  the  mountain  are  fax  away  from  the  vehicle 
resulting  in  small  movements  on  the  image  plane.  The  computed  motion  parameters  of  the  two 
points  marked  in  Figure  13  are  shown  in  Figure  14.  The  nonsmooth  behavior  is  in  part  due  to 
the  uneven  terrain.  The  dynamic  inclusion  of  the  new  feature  points  is  summarized  in  Table  4. 
Figure  15  shows  the  feature  points  detected  in  the  2nd,  8th,  13th,  19th,  25th  and  30th  frames  and 
the  points  added  to  the  tracking  list. 


Table  4:  The  number  of  feature  points  in  the  tracking  list  for  the  Martain  Marietta  R3  Sequence 


frame  number 

1 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

#  of  points  in  the  list 

0 

9 

12 

18 

21 

20 

23 

22 

25 

25 

27 

27 

30 

30 

28 

#  of  points  extracted 

9 

15 

17 

18 

15 

15 

13 

14 

10 

17 

11 

15 

17 

10 

12 

#  of  new  points 

9 

4 

8 

3 

3 

0 

3 

1 

4 

0 

3 

2 

0 

2 

frame  number 

16 

17 

18 

19 

20 

21 

22 

23 

24 

25 

26 

27 

28 

29 

30 

#  of  points  in  the  list 

27 

29 

30 

31 

32 

33 

33 

33 

37 

38 

37 

37 

37 

39 

37 

#  of  points  extracted 

18 

19 

18 

11 

15 

17 

15 

17 

19 

19 

20 

15 

16 

10 

17 

#  of  new  points 

2 

1 

4 

1 

2 

2 

1 

4 

1 

2 

3 

3 

5 

0 

3 

4  Conclusions 

An  algorithm  for  tracking  a  dynamic  set  of  feature  points  to  subpixel  accuracy  over  a  sequence 
of  images  has  been  presented.  In  particular,  a  simple  2-D  kinematic  motion  model  is  employed 
to  describe  feature  point  trajectories,  instead  of  a  more  complicated  3-D  model.  To  account  for 
deviations  from  the  2-D  motion  model,  the  PDAF  is  used  to  provide  initial  estimates  of  the  in-frame 
motion  parameters.  The  application  of  local  image  registration,  taking  into  account  the  varying 
depth  of  the  scene  and  compensating  for  the  motion  between  two  consecutive  frames,  reduces 
the  search  area  and  matching  errors.  In  addition,  the  inclusion  of  new  feature  points  makes  the 
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(e)  Trajectories  up  to  the  25th  frame  (f)  Trajectories  up  to  the  30th  frame 
Figure  13:  Trajectories  for  the  Martin  Marietta  R3  Sequence 
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Figure  14:  Motion  parameters  for  the  Martin  Marietta  R3  Sequence  computed  by  the  Kalman 
Filter 


(a)  Feature  points  in  the  second  frame  (b)  Feature  points  in  the  eighth  frame 


(c)  Feature  points  in  the  13th  frame  (d)  Feature  points  in  the  19th  frame 


(e)  Feature  points  in  the  25th  frame  (f)  Feature  points  in  the  30th  frame 


Figure  15:  Automatically  detected  new  feature  points  in  the  Martin  Marietta  R3  Sequence 


algorithm  useful  for  the  estimation  of  the  pose  and  motion  of  the  camera. 
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Appendix  A 

We  describe  the  PDAF  in  this  appendix.  Assuming  that  the  trajectory  of  a  feature  point  over 
the  image  sequence  has  been  established  up  to  the  kth  image,  the  time-varying  behavior  of  the 
corresponding  state  vector  between  tk  and  tk+i  as  well  as  the  relationship  between  the  measurements 
and  the  state  vector  are  the  same  as  (2)  and  (4),  i.e. 

x(k  +1)  =  /[z(fc)]  +  w(k  +  1) 
z(fc+l)  =  Hxfk  +  1)  +  n(k  +  1) 

Since  the  plant  equation  is  nonlinear,  in  order  to  linearize  the  nonlinear  function  /  using  the  first 
order  Taylor  series  expansion,  the  following  matrix  is  defined: 


Then  at  tk+i,  before  taking  into  account  any  measurement,  the  PDAF  [4]  first  propagates  the  state 
vector  and  the  covariance  matrix  from  U  to  4+1  and  predicts  the  location  of  the  corresponding 
point,  z(k  +  l|k),  by  [4] 

x(k  +  l\k)  =  F[i(A:]A:)]i(k|A:) 

P(k+l\k)  =  F[i(k|fc)]/>(fc|k)F[x(fc|fc)]T  +  g(fc+l)  (34) 

i(it+l|Jfc)  =  Hx(k\k) 

Subsequently,  in  order  to  incorporate  the  information  contained  in  the  (k  -(-  l)st  image,  a  validation 
gate  constructed  based  on  the  Mahalanobis  distance  in  (7)  is  applied.  Only  the  extracted  points 
with  distance  less  than  a  threshold,  say  7,  are  considered  as  the  possible  corresponding  point  for  the 
feature  point.  Without  loss  of  generality,  assume  that  there  are  mk+ 1  points  inside  the  validation 
gate.  The  PDAF  is  ready  to  update  the  state  vector  and  the  covariance  matrix  using  the  past  and 
present  information. 
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Since  there  is  an  ambiguity  in  deciding  which  point  among  the  rrik+i  points  corresponds  to  the 
feature  point,  the  a  posteriori  probability  of  each  point  being  correct  given  the  past  information  is 
evaluated.  In  other  words,  the  association  probability  for  the  jth  point,  Zj(k  +  1),  is  defined  as 

fr(k  +  1)  =  Pr  [Oj(k  +  1)|  Z\k  +  1))  (35) 

where  Z'(k  +  1)  is  the  collection  of  all  possible  measurements  at  each  time  instant  and 

Oj(k  +  1)  =  {z.j(k  +  1)  is  the  corresponding  point} 

In  addition,  the  probability  that  none  of  the  mk+i  points  is  correct  is  considered  and  denoted  by 
0o (k  +  1).  As  above,  if  each  point  is  assumed  to  be  normally  distributed  about  z(k  +  l|fc)  with  the 
corresponding  innovation  vector  represented  by  Vj(k  +  1),  the  association  probabilities  are  shown 
to  be  the  following  [4]: 

0j(k  +1)  —  .  ,  V'^*+TT  3  ~  !'•••)  mk+i 

6+L(=i  e‘ 


0O(k  +  1)  =  .  yA+l 


where 


e:  =  exv[-\vj(k  +  l)TS(k  +  l)  1vj(k  +  1)] 

L  (2\  _  l -PdP„  (  ’ 

b  ~  (.7/  k+1  Pd 

In  (37),  Pg  and  Pj,  represent  the  a  priori  probabilities  that  an  extracted  point  falls  into  the  validation 
gate  and  that  the  corresponding  point  is  detected  by  the  feature  extraction  algorithm  respectively. 
These  prior  probabilities  are  set  to  be  the  same  for  each  feature  point  being  tracked. 

After  the  association  probabilities  are  obtained,  the  state  vector  as  well  as  the  covariance  ma¬ 
trix  are  updated  by  combining  the  information  contained  in  the  to*+1  points  with  the  predicted 
estimates  as  follows: 

K(k  +  1)  =  P(k  +  l\k)HT[H  P(k  +  life)#7,  +  R(k  +  l)]-1 
i(*+l|*+l)  =  x(k+l\k)  +  K(k+l)v(k+l)  (38) 

p(*  +  i|*+i)  =  ^0(fc  +  iW  +  i|fc)  +  [i-/?o(*  +  i)]Pc(fc  +  i|fc  +  i)  +  ^(*+i) 


where 


a(*+i)  =  E7=r/M*+i)M*+i) 

P(k  +  1)  =  K(k  +  l)E7=r  &(*  +  1  )Xj{k  +  1  )vj(k  +  1)T  -  v(k  +  1) 
2(k  +  l)T}K(k+l)T 

Pc(k+l\k+l)  =  [I-K(k+l)H]P(k  +  l\k) 
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This  completes  one  cycle  of  the  PDAF.  As  seen  in  (38),  the  uncertainties  in  the  measurements 
result  in  the  rrik+i  extracted  points  being  combined  with  different  weights  to  correct  the  predicted 
state  vector  in  (34),  and  the  uncertainties  in  the  state  estimates  axe  increased  as  shown  in  (38). 

It  is  worth  noting  that  in  the  derivation  of  the  PDAF,  all  measurements  falling  within  a  val¬ 
idation  gate  are  considered  equally  likely  to  be  the  correct  measurements.  However,  if  additional 
information  is  available  so  that  the  ambiguity  can  be  resolved  and  one  of  the  measurements,  say 
Zj ,  is  identified  as  the  correct  measurement,  its  corresponding  association  probability,  0j(k  +  1), 
should  be  set  to  1.  This  leads  to  the  well  known  EKF. 
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